APPLICATION OF THE CONDITIONAL POPULATION-MIXTURE MODEL TO IMAGE SEGMENTATION


Stanley L. Sclove


University of Illinois at Chicago Circle
Chicago, Illinois


Abstract 

The problem of image segmentation is considered in the context of a mixture of probability distributions. A modification of the usual approach to mixtures of distributions is employed. Parametric families of distributions are considered, a set of parameter values being associated with each distribution. In addition, an identification parameter is associated with each observation, indicating from which distribution the observation arose. Thus, the segmentation problem is cast as a problem of statistical estimation. Segmentation algorithms are obtained by applying a method of iterated maximum likelihood to the resulting likelihood function.


1. Introduction

Consider a digital image, given as a set of $p$-dimensional vectors

$$x_{ij} = (x_{1ij}, \ldots, x_{pij})', \qquad i = 1, 2, \ldots, I, \quad j = 1, 2, \ldots, J.$$

Examples. (i) $p = 3$: $x_{1ij}$ = red level, $x_{2ij}$ = green level, $x_{3ij}$ = blue level of pixel $(i,j)$. (ii) $p = 1$ (monochromatic image): $x_{ij} = x_{1ij}$ = gray level of pixel $(i,j)$.

The problem of image segmentation is, simply stated, the problem of putting the pixels $(i,j)$ into groups (classes, clusters), i.e., the "segments."

Define a segmentation as a partition of the set of pixels, i.e., as a collection $\{C_1, \ldots, C_k\}$ of disjoint sets such that each pixel belongs to one and only one set $C_g$. Each set $C_g$ is a segment (cluster). Here we shall assume that the integer $k$ is specified in advance. (A modification of the algorithm allows some of the segments to join or split, thereby permitting fewer or more than $k$ clusters to form. See Sec. 6.2 below.)

In what follows we shall write $x_i$ rather than $x_{ij}$, using a single subscript $i$ rather than the double subscript $ij$ for the pixels, even though they form a two-dimensional array.

It seems reasonable to consider the following model for segmentation problems:

Assumption (1). With the $g$-th segment ($g = 1, \ldots, k$) is associated a probability distribution with probability density function (p.d.f.) $h_g(x)$. The p.d.f.'s are generally unknown.


Assumption (2). With the $i$-th pixel ($i = 1, \ldots, n = IJ$) is associated a group (segment) identification parameter $\gamma_i$, which is equal to $g$ if and only if pixel $i$ belongs to segment $g$. Each pixel thus gives rise to a pair $(X_i, \gamma_i)$, where $X_i$ is observable and $\gamma_i$ is not.

Remarks. (i) In the context of this model, "segmentation" is merely estimation of the parameters $\gamma_i$ for the $n$ pixels. (ii) In regard to Assumption (1), when we are working with some parametric family indexed by a parameter, say $\beta$, then $h_g$ takes the form $h_g(x) = h(x; \beta_g)$. The parameters $\beta_g$ are generally unknown. (iii) This model is a population-mixture model.

It is convenient to reparametrize. Replace $\gamma_i$ by a $k$-vector $\theta_i$, which consists of $k-1$ zeros and a single 1, the position of the 1 indicating which segment pixel $i$ belongs to; i.e., $\theta_i$ has a 1 as its $\gamma_i$-th element and 0's elsewhere. The p.d.f. of $X_i$, given $\theta_i$, is

$$f(x_i \mid \theta_i) = \sum_{g=1}^{k} \theta_{gi}\, h_g(x_i), \qquad (1.1)$$

where $\theta_{gi}$ is the $g$-th element of $\theta_i$.


2. The Probability Model

The model of Sec. 1 should be contrasted with the usual population-mixture model, in which any observation $X_i$ is chosen from Population $g$ with probability $\pi_g$, so that in this standard population-mixture model the p.d.f. of $X_i$ is

$$f(x_i) = \sum_{g=1}^{k} \pi_g\, h_g(x_i), \qquad (2.1)$$

$i = 1, \ldots, n$. This standard mixture model has been used for pixel classification; see, e.g., Eklundh, Yamamoto, and Rosenfeld [5]. The purpose of the present paper is to suggest the conditional mixture model as an alternative and to present some algorithms derived from it. Further discussion of the model, in the context of statistical cluster analysis, and further references are given by Sclove [8].

A likelihood approach, whether based on the standard or the conditional mixture model, is illuminating in that it can show how ad hoc optimality criteria (objective functions) that have been proposed relate to likelihood functions in particular probability models.

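Note that (1.1) can be written as a product,

$$f(x_i \mid \theta_i) = \prod_{g=1}^{k} [h_g(x_i)]^{\theta_{gi}}, \qquad (2.2)$$

since exactly one $\theta_{gi}$ equals 1 and the others equal 0. The form (2.2) is often more convenient, and we shall use it in what follows.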

3. The Segmentation Algorithm

Using the form (2.2), one sees that the joint p.d.f. of the $X_i$'s, given the $\theta_i$'s, is

$$\prod_{i=1}^{n} \prod_{g=1}^{k} [h(x_i; \beta_g)]^{\theta_{gi}}.$$

The likelihood is to be maximized over all assignments of pixels to segments and over all permissible parameter values. Many ad hoc schemes can be applied to this maximization problem. For example, one way to maximize is to start with a given segmentation, take each observation successively and shift it to the first segment for which a shift results in an increase in likelihood, and loop through the data until no pixel changes segments.
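The shift-one-pixel-at-a-time scheme just described can be sketched for the simplest case. As an assumption not made in the text, take gray levels ($p = 1$) and normal segments with a common variance. The likelihood, maximized over the parameters for a given segmentation, is then a decreasing function of the within-segment sum of squares $W$, so "an increase in likelihood" means "a decrease in $W$". The function names, and the rule that a move may not empty a segment, are hypothetical choices for this sketch.

```python
def within_ss(x, labels, k):
    """Within-segment sum of squares W for 1-D data (empty segments add 0)."""
    total = 0.0
    for g in range(k):
        members = [xi for xi, lab in zip(x, labels) if lab == g]
        if members:
            m = sum(members) / len(members)
            total += sum((xi - m) ** 2 for xi in members)
    return total


def first_improvement_shift(x, labels, k, max_sweeps=100):
    """Move each pixel to the first segment that lowers W (i.e. raises the
    maximized likelihood); stop after a sweep in which no pixel moves."""
    labels = list(labels)
    best = within_ss(x, labels, k)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(x)):
            current = labels[i]
            if labels.count(current) == 1:
                continue  # never empty a segment: k is fixed in advance
            for g in range(k):
                if g == current:
                    continue
                labels[i] = g
                w = within_ss(x, labels, k)
                if w < best - 1e-12:
                    best, moved = w, True
                    break  # accept the first improving shift
                labels[i] = current  # undo: this shift did not help
        if not moved:
            break
    return labels, best
```

For example, `first_improvement_shift([0.0, 0.1, 5.0, 5.1], [0, 1, 0, 1], k=2)` separates the two dark pixels from the two bright ones, with $W \approx 0.01$.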

The algorithm to be developed here is an iterative, back-and-forth procedure. We first maximize with respect to (w.r.t.) the $\theta$'s (holding the $\beta$'s fixed at initial values), then w.r.t. the $\beta$'s (holding the $\theta$'s fixed at the values obtained in the previous stage), then again w.r.t. the $\theta$'s (holding the $\beta$'s fixed at the values obtained in the previous stage), etc. We stop when no $\theta_i$ changes, i.e., when no pixel changes segments, or when a specified amount of computer time has been used.

An alternative way of starting the procedure is to start with an initial segmentation rather than with initial guesses of the $\beta$'s.

It is clear that, for fixed values of the $\beta$'s, say $\hat\beta_g$, the likelihood is maximized, for each $i$, by taking

$$\theta_{gi} = \begin{cases} 1 & \text{if } h(x_i; \hat\beta_g) = \max_{1 \le \ell \le k} h(x_i; \hat\beta_\ell), \\ 0 & \text{otherwise.} \end{cases} \qquad (3.1)$$

(In case of ties an arbitrary choice is made.) In other words, segmentation proceeds by allocating pixel $i$ to the group $g$ for which the estimated probability density of the observation $x_i$ is largest.

Note that, having tentatively estimated the $\theta$'s at any stage, i.e., having tentatively segmented the image, estimation of the $\beta$'s is reduced simply to ordinary maximum likelihood estimation in the particular parametric family at hand. This is a particular advantage of this approach.
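The back-and-forth procedure, with rule (3.1) as the $\theta$-step and ordinary maximum likelihood as the $\beta$-step, can be sketched generically. This is a minimal sketch, not the author's program. The caller supplies two hypothetical functions: `fit` (the ML estimate of $\beta$ from one segment's pixels) and `log_density` ($\ln h(x; \beta)$). The sketch gives each segment its own parameter, so it does not cover the common-covariance case of Sec. 4.1, and it starts from an initial segmentation (the alternative start mentioned above). Purely as an illustrative assumption, the example family is Poisson-distributed pixel counts, whose ML estimate is the segment mean.

```python
import math


def iterated_ml(x, labels, k, fit, log_density, max_iter=100):
    """Alternate ML estimation of the segment parameters (beta-step) with
    rule (3.1) (theta-step) until no pixel changes segment."""
    labels = list(labels)
    params = []
    for _ in range(max_iter):
        # beta-step: ordinary ML estimation within each tentative segment
        new_params = []
        for g in range(k):
            members = [xi for xi, lab in zip(x, labels) if lab == g]
            if members:
                new_params.append(fit(members))
            elif params:
                new_params.append(params[g])  # keep old estimate for an empty segment
            else:
                raise ValueError(f"segment {g} is empty in the initial segmentation")
        params = new_params
        # theta-step, rule (3.1): largest estimated density; max() resolves
        # ties in favour of the lowest index (one arbitrary choice)
        new_labels = [max(range(k), key=lambda g: log_density(xi, params[g])) for xi in x]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, params


# Assumed example family: Poisson counts, h(x; lam) = exp(-lam) lam^x / x!
def poisson_fit(members):
    return max(sum(members) / len(members), 1e-9)  # ML estimate, kept positive


def poisson_log_density(x, lam):
    return -lam + x * math.log(lam) - math.lgamma(x + 1)
```

Starting from a deliberately wrong segmentation of the counts `[1, 2, 1, 0, 9, 11, 10, 12]`, two passes move the 0 into the dark segment, giving rate estimates 1.0 and 10.5.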


Let $L(B, T)$ denote the likelihood, where $T$ denotes the set of $\theta_i$'s and $B$ the set of $\beta_g$'s. Let $B^{(s)}$ denote the value of $B$ which maximizes $L$ at the $s$-th stage of the iteration, and let $T^{(s)}$ denote the value of $T$ which maximizes $L$ at the $s$-th stage of the iteration. Then $T^{(s)}$ maximizes $L(B^{(s)}, T)$ w.r.t. $T$, and $B^{(s)}$ maximizes $L(B, T^{(s-1)})$ w.r.t. $B$. This back-and-forth maximization is an example of the relaxation method (Southwell's method); see Ortega & Rheinboldt [7] (pp. 214ff.) and Southwell [9, 10]. It follows that

$$L(B^{(s+1)}, T^{(s+1)}) \ge L(B^{(s+1)}, T^{(s)}) \ge L(B^{(s)}, T^{(s)}).$$

That is, at no stage of the procedure can the value of the likelihood decrease; however, there is no guarantee of convergence to the global maximum (nor do alternative clustering algorithms guarantee convergence to the global maximum of their objective functions). To see how the procedure can fail to converge to a global maximum, suppose it happens that $L(B^{(s)}, T^{(s)}) \ge L(B, T^{(s)})$ for all $B$, or $L(B^{(s)}, T^{(s-1)}) \ge L(B^{(s)}, T)$ for all $T$. Then the procedure will terminate at the $s$-th stage, without having necessarily reached the global maximum. That is, if, having maximized w.r.t. one of the variables $B$ and $T$, we happen to find ourselves at a (relative) maximum w.r.t. the other, we may not reach a global maximum.

4. Application to Particular Distributions

Now we consider application of this general method to particular families of distributions. First we consider normal distributions with common covariance matrix, for in this case it becomes clear how the model establishes a link with some existing clustering procedures.

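4.1. Multivariate Normal Populations with Common Covariance Matrix

In the case of normal populations with means $\mu_g$, $g = 1, \ldots, k$, and common covariance matrix $\Sigma$, the likelihood takes the form

$$(2\pi)^{-np/2} |\Sigma|^{-n/2} \exp\Big[-\sum_{i=1}^{n} \sum_{g=1}^{k} \theta_{gi}\, q(x_i; \mu_g, \Sigma)/2\Big],$$

where the quadratic form $q$ is given by

$$q(x; \mu, \Sigma) = (x - \mu)' \Sigma^{-1} (x - \mu),$$

the squared (Mahalanobis) distance between $x$ and $\mu$ in the metric of $\Sigma$. Here (3.1) is equivalent to

$$\theta_{gi} = \begin{cases} 1 & \text{if } q(x_i; \hat\mu_g, \hat\Sigma) = \min_{1 \le \ell \le k} q(x_i; \hat\mu_\ell, \hat\Sigma), \\ 0 & \text{otherwise.} \end{cases} \qquad (4.1)$$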

That is, pixel $i$ is assigned to that group to whose tentatively estimated mean vector it is closest, where the distance is in the metric of the tentatively estimated covariance matrix. Having estimated the $\theta$'s, we have multivariate normal observations arranged into groups; maximization w.r.t. the $\mu_g$'s and $\Sigma$ is accomplished by taking the group mean vectors as estimates for the $\mu_g$'s, and the within-groups sum-of-products matrix, divided by $n$, gives the estimate of $\Sigma$. The procedure is iterated: using the new estimates $\hat\mu_g$, $g = 1, \ldots, k$, and $\hat\Sigma$, the rule (4.1) is applied again. Then new $\hat\mu_g$'s and a new $\hat\Sigma$ are calculated; etc. The Mahalanobis distances can be computed efficiently; see, e.g., Anderson [1], p. 107.
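The iteration for the common-covariance case can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the author's program. The pixel vectors are the rows of an $n \times p$ array, the iteration starts from a given labelling, and the function name is hypothetical. Each pass estimates the group means and $\hat\Sigma$ (the within-groups sum-of-products matrix divided by $n$), then applies rule (4.1) with squared Mahalanobis distances.

```python
import numpy as np


def segment_common_covariance(x, labels, k, max_iter=100):
    """Iterated ML for k normal segments sharing one covariance matrix."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels, dtype=int).copy()
    n = x.shape[0]
    for _ in range(max_iter):
        if np.any(np.bincount(labels, minlength=k) == 0):
            raise ValueError("a segment became empty")
        # ML step for the current segmentation: group means and W / n
        means = np.array([x[labels == g].mean(axis=0) for g in range(k)])
        resid = x - means[labels]
        sigma = resid.T @ resid / n
        precision = np.linalg.inv(sigma)
        # Rule (4.1): squared Mahalanobis distance of every pixel to every mean
        diff = x[:, None, :] - means[None, :, :]  # shape (n, k, p)
        q = np.einsum("igp,pr,igr->ig", diff, precision, diff)
        new_labels = q.argmin(axis=1)  # ties go to the lowest index
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, means, sigma
```

With eight hypothetical two-band pixels, four near $y = 0$ and four near $y = 5$, and a starting labelling that mixes them, the sketch recovers the two groups in two passes:

```python
x = [[0, 0], [1, 0.1], [2, -0.1], [3, 0], [0, 5], [1, 5.1], [2, 4.9], [3, 5]]
labels, means, sigma = segment_common_covariance(x, [0, 0, 1, 0, 1, 1, 0, 1], 2)
```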

Relationship with the isodata procedure. This scheme is a Mahalanobis-distance version of Ball and Hall's [2] isodata clustering procedure. Isodata proceeds as follows. One starts with tentative estimates of cluster means and assigns each individual to the mean to which he is closest. (The isodata scheme uses Euclidean distance, or modified Euclidean distance in which different weights are assigned to the $p$ dimensions.) The cluster means are then re-estimated, and one loops through the data again, reassigning the individuals, etc. Note the similarity to our scheme: we start with tentative estimates of the $\mu_g$'s and $\Sigma$ and assign each individual to the mean to which he is closest, using Mahalanobis distance in the metric of the tentatively estimated covariance matrix. The $\mu_g$'s and $\Sigma$ are then re-estimated, the individuals (pixels) are reallocated to clusters (segments), etc.

An important difference is that our scheme employs Mahalanobis distance rather than Euclidean or weighted-Euclidean distance. (It is worth emphasizing that it is the Mahalanobis distance based on the within-groups sum-of-products matrix that arises here; some data analysts use the total sum-of-products matrix, which is not appropriate; see, e.g., Chernoff [3].)
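The difference between the metrics can be made concrete with a hypothetical two-band example; the numbers are chosen for illustration and are not taken from the paper. With strongly correlated bands, a pixel can be nearer one mean in Euclidean distance yet nearer the other in Mahalanobis distance. Because the two variances here are equal, weighting each dimension by its variance (a weighted-Euclidean distance) gives the same answer as plain Euclidean distance; only the correlation separates the two rules. The helper name is hypothetical.

```python
import numpy as np


def nearest_mean(x, means, metric):
    """Index of the mean minimizing (x - m)' M (x - m) for a p x p matrix M."""
    d = np.asarray(means, dtype=float) - np.asarray(x, dtype=float)
    q = np.einsum("gp,pr,gr->g", d, metric, d)
    return int(np.argmin(q))


means = [[0.0, 0.0], [2.0, 0.0]]
sigma = np.array([[1.0, 0.9], [0.9, 1.0]])  # hypothetical common covariance
pixel = [1.2, 1.2]

euclidean_choice = nearest_mean(pixel, means, np.eye(2))                # 2.88 vs 2.08
mahalanobis_choice = nearest_mean(pixel, means, np.linalg.inv(sigma))  # about 1.52 vs 20.04
```

Euclidean distance sends the pixel to the second mean; Mahalanobis distance, which discounts displacement along the direction of positive correlation, sends it to the first.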

Some experiments with the algorithm, in the context of statistical cluster analysis, are reported in Sclove [8].

Relationship with the k-means procedure. Arranging the computation a little differently, updating the estimates of the $\mu_g$'s and $\Sigma$ after each individual pixel is assigned rather than waiting until all have been assigned, produces a Mahalanobis-distance version of MacQueen's [6] k-means procedure.
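A sketch of this sequential arrangement follows, assuming the pixel vectors are the rows of an array and the starting labelling leaves no segment empty. Keeping each segment's count, sum, and sum of outer products lets the means and the pooled within-groups matrix be refreshed immediately after every single reassignment. The function name and the rule that a segment is never emptied are assumptions of this sketch, not details taken from MacQueen or from the text.

```python
import numpy as np


def sequential_mahalanobis_kmeans(x, labels, k, max_sweeps=10):
    """Reassign pixels one at a time, updating the means and the pooled
    covariance (W / n) immediately after each reassignment."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels, dtype=int).copy()
    n, p = x.shape
    counts = np.bincount(labels, minlength=k).astype(float)
    if np.any(counts == 0):
        raise ValueError("every segment needs at least one pixel to start")
    sums = np.zeros((k, p))
    outer = np.zeros((k, p, p))
    for xi, g in zip(x, labels):
        sums[g] += xi
        outer[g] += np.outer(xi, xi)

    def estimates():
        means = sums / counts[:, None]
        # W = sum_i x_i x_i' - sum_g n_g m_g m_g'
        w = outer.sum(axis=0) - np.einsum("g,gp,gr->pr", counts, means, means)
        return means, w / n

    for _ in range(max_sweeps):
        moved = False
        for i in range(n):
            means, sigma = estimates()
            d = means - x[i]
            q = np.einsum("gp,pr,gr->g", d, np.linalg.inv(sigma), d)
            new, old = int(np.argmin(q)), labels[i]
            if new != old and counts[old] > 1:
                xx = np.outer(x[i], x[i])
                counts[old] -= 1
                sums[old] -= x[i]
                outer[old] -= xx
                counts[new] += 1
                sums[new] += x[i]
                outer[new] += xx
                labels[i] = new
                moved = True
        if not moved:
            break
    means, sigma = estimates()
    return labels, means, sigma
```

The sufficient-statistic bookkeeping is what makes the per-pixel update cheap; recomputing the means and $\hat\Sigma$ from scratch after each move would give the same estimates.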

4.2. Multivariate Normal Populations with Different Covariance Matrices

The algorithm generated for this case turns out not to be simply to use a different Mahalanobis distance for each cluster. (The complication which occurs is analogous to that in classification, or discriminant analysis, where one is led to quadratic discriminant functions if the covariance matrices differ.) The likelihood is

$$(2\pi)^{-np/2} \prod_{i=1}^{n} \prod_{g=1}^{k} |\Sigma_g|^{-\theta_{gi}/2} \exp\Big[-\sum_{i=1}^{n} \sum_{g=1}^{k} \theta_{gi}\, q(x_i; \mu_g, \Sigma_g)/2\Big].$$

Equation (3.1) becomes

$$\theta_{gi} = \begin{cases} 1 & \text{if setting } \ell = g \text{ maximizes } |\hat\Sigma_\ell|^{-1/2} \exp[-q(x_i; \hat\mu_\ell, \hat\Sigma_\ell)/2], \\ 0 & \text{otherwise.} \end{cases} \qquad (4.2)$$

Maximizing the expression in (4.2) is equivalent to minimizing

$$\ln|\hat\Sigma_\ell| + q(x_i; \hat\mu_\ell, \hat\Sigma_\ell).$$
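Rule (4.2) can be sketched as follows; the function name and the `include_logdet` switch are hypothetical. Setting `include_logdet=False` gives the naive rule of using a different Mahalanobis distance for each cluster, which, as noted above, is not what maximum likelihood produces. The one-dimensional numbers are an assumed illustration: two segments with equal means and variances 1 and 100.

```python
import numpy as np


def assign_quadratic(x, means, covs, include_logdet=True):
    """Rule (4.2): minimize ln|Sigma_g| + q(x; mu_g, Sigma_g) over g."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    scores = []
    for mu, cov in zip(means, covs):
        cov = np.atleast_2d(np.asarray(cov, dtype=float))
        sign, logdet = np.linalg.slogdet(cov)
        if sign <= 0:
            raise ValueError("covariance estimate is not positive definite")
        d = x - np.asarray(mu, dtype=float)
        q = np.einsum("ip,pr,ir->i", d, np.linalg.inv(cov), d)
        scores.append(q + logdet if include_logdet else q)
    return np.argmin(np.stack(scores, axis=1), axis=1)


pixel = [[1.5]]
means = [[0.0], [0.0]]
covs = [[[1.0]], [[100.0]]]
ml_choice = assign_quadratic(pixel, means, covs)[0]                           # 2.25 vs 0.0225 + ln 100
naive_choice = assign_quadratic(pixel, means, covs, include_logdet=False)[0]  # 2.25 vs 0.0225
```

The naive rule puts the pixel in the wide segment; the likelihood rule keeps it in the narrow one, because the $\ln|\hat\Sigma_\ell|$ term penalizes the wide segment by $\ln 100 \approx 4.6$.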

It has been noted (see, e.g., Day [4]) that in the standard mixture model for this case the supremum of the likelihood is infinite. This is reflected in the fact that in our algorithm it would be possible that at some stage one of the clusters would consist of a single individual, so that the tentative estimate of the mean of that group would be the vector of observations for that individual, and the tentative estimate of the covariance matrix of that cluster would be undefined. It is also possible for the observations in a given cluster to lie very close to a lower-dimensional subspace, so that the tentative estimate of the covariance matrix could have an arbitrarily small determinant, and the maximized likelihood could be arbitrarily large, for the contribution of Group $g$ to the maximized likelihood is inversely proportional to a positive power of the determinant.
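The last claim can be checked directly. Write $n_g$ for the number of pixels in $C_g$, $\bar x_g$ for their mean, and $\phi(\cdot\,; \mu, \Sigma)$ for the $p$-variate normal density (notation introduced here), and assume $\hat\Sigma_g$ is nonsingular. Then the maximized contribution of Group $g$ is

$$\max_{\mu_g, \Sigma_g} \prod_{i \in C_g} \phi(x_i; \mu_g, \Sigma_g) = (2\pi)^{-n_g p/2}\, |\hat\Sigma_g|^{-n_g/2}\, e^{-n_g p/2}, \qquad \hat\Sigma_g = \frac{1}{n_g} \sum_{i \in C_g} (x_i - \bar x_g)(x_i - \bar x_g)',$$

because $\sum_{i \in C_g} q(x_i; \bar x_g, \hat\Sigma_g) = \operatorname{tr}\big(\hat\Sigma_g^{-1} \cdot n_g \hat\Sigma_g\big) = n_g p$. The determinant therefore enters with the power $-n_g/2$, and the contribution grows without bound as $|\hat\Sigma_g| \to 0$.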

5. Comparison with the Method Based on the Standard Mixture Model

Wolfe [11] has considered clustering based on the standard mixture model. Under that model the posterior probability that Individual $i$ belongs to Group $g$ is

$$\pi_g\, h(x_i; \beta_g) \Big/ \sum_{\ell=1}^{k} \pi_\ell\, h(x_i; \beta_\ell). \qquad (5.1)$$

If we can obtain estimates for $\beta_g$, $\pi_g$, $g = 1, \ldots, k$, they can be substituted to provide an estimate of (5.1),

$$\hat\pi_g\, h(x_i; \hat\beta_g) \Big/ \sum_{\ell=1}^{k} \hat\pi_\ell\, h(x_i; \hat\beta_\ell). \qquad (5.2)$$

Individual $i$ is assigned to that group $g$ for which the estimated posterior probability of group membership, (5.2), is largest. On the other hand, with the conditional mixture model Individual $i$ is assigned to that group $g$ for which the estimated density $h(x_i; \hat\beta_g)$ is largest.
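The two allocation rules can disagree when the mixing proportions are unequal; when all $\hat\pi_g$ are equal they choose the same group, since the denominator of (5.2) does not depend on $g$. The sketch below uses two assumed univariate normal segments, $N(0, 1)$ with $\hat\pi_1 = 0.9$ and $N(2, 1)$ with $\hat\pi_2 = 0.1$. The function names are hypothetical, and group indices start at 0.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def assign_standard_mixture(x, params, weights):
    """Standard mixture model: largest estimated posterior probability (5.2)."""
    dens = [w * normal_pdf(x, mu, s) for w, (mu, s) in zip(weights, params)]
    post = [d / sum(dens) for d in dens]
    return max(range(len(post)), key=post.__getitem__), post


def assign_conditional_mixture(x, params):
    """Conditional mixture model: largest estimated density h(x; beta_g)."""
    dens = [normal_pdf(x, mu, s) for mu, s in params]
    return max(range(len(dens)), key=dens.__getitem__)


params = [(0.0, 1.0), (2.0, 1.0)]
standard_choice, posterior = assign_standard_mixture(1.2, params, [0.9, 0.1])
conditional_choice = assign_conditional_mixture(1.2, params)
```

At $x = 1.2$ the density is larger under the second segment, so the conditional-mixture rule picks it, while the large prior weight on the first segment gives that segment a posterior probability of about 0.86 under the standard rule.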

Wolfe has provided computer programs for the case of normal distributions. As is well known, the maximum likelihood equations for mixture problems are messy. He solves them by a multivariate Newton-Raphson method of iterative solution. This involves the assignment of arbitrary initial values to start the iterative solution, as does the general method described here.

6. Some Remarks on Statistical Inference

The maximum likelihood estimate of $(B, T)$ is the value $(\hat B, \hat T)$ for which the likelihood $L$ is largest. The quantity $L(\hat B, \hat T)$ is the corresponding maximum value of the likelihood. To approximate $(\hat B, \hat T)$ one uses the algorithm. Let $\Lambda(B, T) = L(B, T)/L(\hat B, \hat T)$. Let $F$ denote the large-sample c.d.f. of $-2 \ln \Lambda$, i.e., $\lim_{n \to \infty} \Pr[-2 \ln \Lambda(B, T) \le x] = F(x)$. Suppose that $F$ is independent of the true values $(B, T)$. For example, it may be the c.d.f. of a chi-square distribution with an appropriate number of degrees of freedom; it is necessary to investigate the extent to which the large-sample theory of the generalized likelihood ratio applies when there are incidental parameters.

6.1. Confidence Sets

Let $x_\alpha$ denote the upper $\alpha$-th percentage point of $F$. Then

$$1 - \alpha = F(x_\alpha) \approx \Pr[-2 \ln \Lambda(B, T) \le x_\alpha] = \Pr[-2 \ln L(B, T) \le x_\alpha - 2 \ln L(\hat B, \hat T)],$$

so that $\{(B, T) : -2 \ln L(B, T) \le x_\alpha - 2 \ln L(\hat B, \hat T)\}$ is an approximate $100(1-\alpha)\%$ confidence set for $(B, T)$.

Denote by $(\tilde B, \tilde T)$ the estimates produced by the algorithm. Then $L(\tilde B, \tilde T) \le L(\hat B, \hat T)$. Thus a conservative confidence set, one that contains every value of $(B, T)$ in the set above and so has (approximate) confidence coefficient at least $1 - \alpha$, is

$$\{(B, T) : -2 \ln L(B, T) \le x_\alpha - 2 \ln L(\tilde B, \tilde T)\}.$$
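Membership in the conservative set needs only two log-likelihood values and the critical point $x_\alpha$. The sketch is hypothetical throughout: the function name, the log-likelihood values, and especially the choice of $F$. As the text stresses, the right limiting distribution is an open question. Purely for illustration, $F$ is taken here to be chi-square with 2 degrees of freedom, whose upper $\alpha$ point is exactly $-2 \ln \alpha$.

```python
import math


def in_conservative_set(loglik, loglik_algorithm, x_alpha):
    """True if -2 ln L(B, T) <= x_alpha - 2 ln L(B~, T~)."""
    return -2.0 * loglik <= x_alpha - 2.0 * loglik_algorithm


alpha = 0.05
x_alpha = -2.0 * math.log(alpha)  # about 5.99: chi-square(2 df) assumed, for illustration only
ll_algorithm = -100.0             # hypothetical ln L at the algorithm's estimates

inside = in_conservative_set(-102.0, ll_algorithm, x_alpha)   # 204.0 <= 205.99...
outside = in_conservative_set(-103.5, ll_algorithm, x_alpha)  # 207.0 >  205.99...
```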




6.2. Some Remarks on Choice of $k$

The algorithm can be run with different choices of $k$ and the results compared. Note that the likelihood function is a different function for different values of $k$. Denote this dependence upon $k$ by writing the likelihood as $L_k(B_k, T_k)$. Let $\hat B_k$, $\hat T_k$ denote the maximum likelihood estimates for fixed $k$. Following Wolfe's approach for the standard mixture model, one might make a sequence of hypothesis tests to decide on $k$, first comparing $L_2(\hat B_2, \hat T_2)$ with $L_3(\hat B_3, \hat T_3)$, then if necessary comparing $L_3(\hat B_3, \hat T_3)$ with $L_4(\hat B_4, \hat T_4)$, etc. Wolfe uses the asymptotic chi-square distribution of the generalized likelihood ratio here; even in the context of the standard mixture model this may not be the asymptotic distribution.

An alternative approach to choice of $k$ is to follow MacQueen's suggestion [6] of introducing refinement and coarsening parameters $R$ and $C$ such that two clusters join when their mean vectors are less than $R$ units apart and a cluster splits when its diameter exceeds $C$.


7. Conclusions


A modification of the usual mixture model has been employed to provide a probability framework for clustering/segmentation problems. A general method of producing algorithms which correspond to a method of iterated maximum likelihood has been given. The general method given here is plausible, is linked to a probability model, and is easy to program. In the case of multivariate normal distributions with common covariance matrix the general method produces schemes which can be viewed as improved versions of some existing schemes.

The focus here has been on the parametric case, but the methods discussed might be applied to the nonparametric case by estimating the p.d.f.'s $h_g(x)$ as the clustering proceeds, using standard methods of density estimation.

Algorithms based on a likelihood function are based on the raw data matrix, in contrast to many clustering procedures which are based on a matrix of pairwise similarities or distances. The latter procedures have the advantage of applicability to problems where a raw data matrix is not available. When the raw data are available, such algorithms have the theoretical disadvantage of not extracting all the information from the observations and the computational disadvantage of preliminary computation of all the pairwise distances (or similarities).

Alternative models for image segmentation. The focus here has been on a model in which the segment-identification (pixel-classification) parameters $\theta_i$ are treated as functionally independent. In the standard mixture model they become random variables and are treated as statistically independent. To Assumptions (1) and (2) of Sec. 1 it seems reasonable to add

Assumption (3). Each segment consists of more than one pixel.

As a corollary to this assumption, it follows that the $\theta_i$'s are functionally related, inasmuch as each $\theta_i$ must be equal to one of its eight neighbors. It would be interesting to study the problem resulting from maximizing the likelihood function under this condition. Alternatively, if the $\theta_i$'s are then treated as random, they would form a two-dimensional Markov process. It will be interesting to study the problem of estimating them in this model.
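The eight-neighbor condition associated with Assumption (3) is easy to check on a label image. The following is a minimal sketch; the function name and the toy label array are assumptions for illustration. For each pixel it reports whether at least one of its (up to eight) neighbors carries the same segment label; pixels on the image border simply have fewer neighbors.

```python
def neighbor_condition(labels):
    """For each pixel, True if some 8-neighbor has the same segment label."""
    rows, cols = len(labels), len(labels[0])
    result = []
    for i in range(rows):
        row = []
        for j in range(cols):
            row.append(any(
                labels[a][b] == labels[i][j]
                for a in range(max(i - 1, 0), min(i + 2, rows))
                for b in range(max(j - 1, 0), min(j + 2, cols))
                if (a, b) != (i, j)
            ))
        result.append(row)
    return result


segments = [[0, 0, 1],
            [0, 2, 1],
            [0, 0, 1]]
ok = neighbor_condition(segments)  # only the isolated pixel (1, 1) fails
```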


Acknowledgements 

This report was prepared under Office of Naval Research Contract N00014-80-C-0408, Task NR042-443.

Bibliography 

[1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis. New York: Wiley, 1958.

[2] G. H. Ball & D. J. Hall, "A clustering technique for summarizing multivariate data," Behavioral Sci., vol. 12, 153-155, 1967.

[3] H. Chernoff, "Metric considerations in cluster analysis," Proc. 6th Berkeley Symp. Math. Statist. Prob., vol. 1. Los Angeles and Berkeley: Univ. of Calif. Press, 621-629, 1970.

[4] N. E. Day, "Estimating the components of a mixture of normal distributions," Biometrika, vol. 56, 463-474, 1969.

[5] J. O. Eklundh, H. Yamamoto & A. Rosenfeld, "A relaxation method for multispectral pixel classification," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-2, 72-75, 1980.

[6] J. MacQueen, "Some methods for classification and analysis of multivariate observations," Proc. 5th Berkeley Symp. Math. Statist. Prob., vol. 1. Los Angeles and Berkeley: Univ. of Calif. Press, 281-297, 1966.

[7] J. Ortega & W. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables. New York: Academic Press, 1970.

[8] S. L. Sclove, "Population mixture models and clustering algorithms," Communications in Statistics (A), vol. A6, 417-434, 1977.

[9] R. Southwell, Relaxation Methods in Engineering Science: a Treatise on Approximate Computation. London: Oxford Univ. Press, 1940.

[10] R. Southwell, Relaxation Methods in Theoretical Physics. London and New York: Oxford Univ. Press (Clarendon), 1946.

[11] J. H. Wolfe, "Pattern clustering by multivariate mixture analysis," Multivariate Behavioral Research, vol. 5, 329-350, 1970.


REPORT DOCUMENTATION PAGE

Title: Application of the Conditional Population-Mixture Model to Image Segmentation
Type of report: Technical Report
Author: Stanley L. Sclove
Contract or grant number: N00014-80-C-0408
Performing organization: Department of Mathematics, University of Illinois at Chicago Circle, Box 4348, Chicago, IL 60680
Report date: August 15, 1980
Monitoring agency: Office of Naval Research, Statistics and Probability Branch, Arlington, VA
Security classification: Unclassified
Distribution statement: Approved for public release; distribution unlimited.
Key words: Image processing, pattern recognition, pixel classification; mixtures of distributions, cluster analysis, isodata procedure, k-means procedure, Mahalanobis distance, multivariate analysis; relaxation methods.



